--- category: projects --- Creating the geom_area Plot

Goal

To match the plot of the MnDOT, I’ll use an area plot, so I use the same code but instead of geom_point(), I’ll use geom_area(). I’ll also need to plot two different sets of data on one plot: ‘2017-2019 Daily Average’ and ‘2020 Daily Total’. To do this, I’ll create a new view in SQL that calculates 2017-2019 daily averages in one column and includes the 2020 counts in a second new column. However, before I do that, the MnDOT report combined the ‘In’ and ‘Out’ directions for each day, so I’ll start by doing that.

Connect to MySQL Database

Combining ‘In’ and ‘Out’ Values

Right now, the table looks like this:

counts_daily <- tbl(con, 'counts_daily')
head(counts_daily)
## # Source:   lazy query [?? x 9]
## # Database: mysql [root@:/counts]
##   bikeometer_id date       direction   count is_weekend  year month   day
##           <int> <date>     <chr>     <int64>      <int> <int> <int> <int>
## 1            10 2011-09-02 I             343          0  2011     9     2
## 2            10 2011-09-02 O             312          0  2011     9     2
## 3            10 2011-09-03 I             344          1  2011     9     3
## 4            10 2011-09-03 O             354          1  2011     9     3
## 5            10 2011-09-04 I             352          1  2011     9     4
## 6            10 2011-09-04 O             348          1  2011     9     4
## # ... with 1 more variable: month_day <chr>

In MySQL, I’ll create a new view in MySQL that takes the sum of both the ‘I’ and ‘O’ values and group them by ‘date’ and ‘bikeometer_id’.

For those not familiar with MySQL views, they look like tables but are actually recalculated each them they are accessed. This is what I want, because my python program imports the new Bike Arlington data into my ‘counts_daily’ MySQL table. Each time the ‘counts_daily_total’ view is accessed, it will automatically update to account for new data added to the ‘counts_daily’ table. Thus, I don’t need to change my python program to generate new and useful tables in MySQL.

Now, let’s take a look at our new view.

counts_daily_total <- tbl(con, 'counts_daily_total')
head(counts_daily_total)
## # Source:   lazy query [?? x 5]
## # Database: mysql [root@:/counts]
##   bikeometer_id date       count is_weekend month_day
##           <int> <date>     <dbl>      <int> <chr>    
## 1            10 2011-09-02   655          0 09-02    
## 2            10 2011-09-03   698          1 09-03    
## 3            10 2011-09-04   700          1 09-04    
## 4            10 2011-09-05   459          0 09-05    
## 5            10 2011-09-06   211          0 09-06    
## 6            10 2011-09-07   194          0 09-07

As you can see, the ‘I’ and ‘O’ directions are combined!

Filtering for Date Range and Bikeometer ID

In a previous post, I chose a time-frame and which Bikeometers were best to plot. So I’ll filter by creating a new view in MySQL called ‘vw_filtered_counts_daily_total_2017_to_2019’.

filtered_counts_daily_total_2017_to_2019 <- tbl(con, 'filtered_counts_daily_total_2017_to_2019')
filtered_counts_daily_total_2017_to_2019
## # Source:   table<filtered_counts_daily_total_2017_to_2019> [?? x 5]
## # Database: mysql [root@:/counts]
##    bikeometer_id date       count is_weekend month_day
##            <int> <date>     <dbl>      <int> <chr>    
##  1            18 2017-01-01    52          1 01-01    
##  2            18 2017-01-02    14          0 01-02    
##  3            18 2017-01-03    43          0 01-03    
##  4            18 2017-01-04   116          0 01-04    
##  5            18 2017-01-05    74          0 01-05    
##  6            18 2017-01-06    51          0 01-06    
##  7            18 2017-01-07     7          1 01-07    
##  8            18 2017-01-08    15          1 01-08    
##  9            18 2017-01-09    49          0 01-09    
## 10            18 2017-01-10    63          0 01-10    
## # ... with more rows

Creating the 2017-2019 Daily Average column

Adding a year column (not needed)

df <- tbl(con, 'the_best_table')
# These Bikeometers are all located in the Arlington region

In order for this table to be easily read by R, we need to covert it from a wide table to a long table. This is the resource I used to understand how to do the conversion.

df.long <- pivot_longer(df, cols=2:3, names_to='year', values_to='counts')

Here I added the year to the end of the all the month_day values then convert that to a date class so R has an easy time graphing it.

df.long <- mutate(df.long, month_day=paste(month_day, '-2020', sep = ''))
df.long <- collect(df.long)

Next, I converted each character string in the month_day column into a date class.

#```{r} df.long$month_day <- mutate(month_day = as.Date(month_day, format=‘%m/%d/%Y’))

df.long

#```

Then I’ll filter for the dates in my range

df.long.filtered <- df.long %>% filter(month_day >= '03-12-2020' & month_day <= '05-15-2020') %>% collect()
df.long.filtered
## # A tibble: 130 x 3
##    month_day  year                 counts
##    <chr>      <chr>                 <dbl>
##  1 03-12-2020 average_2017_to_2019   88.4
##  2 03-13-2020 average_2017_to_2019  101. 
##  3 03-14-2020 average_2017_to_2019   93.7
##  4 03-15-2020 average_2017_to_2019   83.8
##  5 03-16-2020 average_2017_to_2019   63.9
##  6 03-17-2020 average_2017_to_2019   55.2
##  7 03-18-2020 average_2017_to_2019   97  
##  8 03-19-2020 average_2017_to_2019  110. 
##  9 03-20-2020 average_2017_to_2019  108. 
## 10 03-21-2020 average_2017_to_2019   71.8
## # ... with 120 more rows

Finally I convert the month_day values from a ‘character class’ to a time class. Be sure to lowercase the ‘m’ and ‘d’ in ‘%m-%d-%Y’.

df.long.filtered$month_day <- strptime(df.long.filtered$month_day, '%m-%d-%Y')
class(df.long.filtered$month_day[1])
## [1] "POSIXlt" "POSIXt"
df.long.filtered$month_day <- as.Date(df.long.filtered$month_day)
class(df.long.filtered$month_day[1])
## [1] "Date"
#library(ggpmisc)
p <- df.long.filtered %>% ggplot(aes(x = month_day, y = counts)) +
  geom_line(aes(group = year, color = year))
  #scale_x_date()
  #stat_peaks(span = 30)
print(p)

p <- df.long.filtered %>% ggplot(aes(x = month_day, y = counts)) +
  geom_area(aes(group = year, color = year))
print(p)

It looks like there definitely was an increase but it doesn’t look as nice as the MnDOT plot. Let’s investigate this a little.

Bikeometer Locations

I’ll map the locations of the Bikeometers to see where exactly they are in Arlington. To accomplish this, I found this video by Professor Lisa Lendway very helpful along with her website.

On the left are the numbers we need to enter into the bbox variable below in order to get the graph from Stamen Maps. I’ve chosen ‘terrain’ as my map type and the ‘zoom’ will be 13, which is in the openstreetmap url just after the text ‘map=’.

library(ggmap)
## Warning: package 'ggmap' was built under R version 4.0.5
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.0.4
# Get the map information
arlington <- get_stamenmap(
    bbox = c(left = -77.2377, bottom = 38.8075, right = -76.9744, top = 38.9080), 
    maptype = "terrain",
    zoom = 12)
## Source : http://tile.stamen.com/terrain/12/1169/1566.png
## Source : http://tile.stamen.com/terrain/12/1170/1566.png
## Source : http://tile.stamen.com/terrain/12/1171/1566.png
## Source : http://tile.stamen.com/terrain/12/1172/1566.png
## Source : http://tile.stamen.com/terrain/12/1169/1567.png
## Source : http://tile.stamen.com/terrain/12/1170/1567.png
## Source : http://tile.stamen.com/terrain/12/1171/1567.png
## Source : http://tile.stamen.com/terrain/12/1172/1567.png
## Source : http://tile.stamen.com/terrain/12/1169/1568.png
## Source : http://tile.stamen.com/terrain/12/1170/1568.png
## Source : http://tile.stamen.com/terrain/12/1171/1568.png
## Source : http://tile.stamen.com/terrain/12/1172/1568.png

Next we will import the Bikeometer data from my database that contains the longitude and latitudes needed to plot the Bikeometers.

sql_cmd <- "SELECT * FROM counts.bikeometer_details WHERE bikeometer_id in (14,15,16,18,22,31,39)"
# creates a lazy table
bikeometer_table <- dbGetQuery(con, sql_cmd)
bikeometer_table
##   bikeometer_id                  name  latitude  longitude     region region_id
## 1            18  Crystal NB bike lane 38.857315 -77.049152  Arlington         1
## 2            14  Fairfax EB bike lane 38.882767 -77.104615  Arlington         1
## 3            22 Military NB bike lane 38.905509 -77.109349  Arlington         1
## 4            39 Potomac Yard Trail #1 38.829850 -77.047850 Alexandria         2
## 5            16   Quincy NB bike lane 38.885457 -77.108055  Arlington         1
## 6            15   Quincy SB bike lane 38.884877 -77.108075  Arlington         1
## 7            31      Roosevelt Bridge 38.893822 -77.065737  Arlington         1

One Bikeometer stands out as not even being in Alrington… how did that get in there? Let’s plot it anyway to see how close it is to Arlington. Realistically, I frequently bike into Alexandria on the weekends but I’ll probably end up excluding it none the less.

# Convert latitude and longitude to integers
bikeometer_table$longitude <- as.numeric(bikeometer_table$longitude)
bikeometer_table$latitude <- as.numeric(bikeometer_table$latitude)

# Plot the points on the map
ggmap(arlington) + # creates the map "background"
  geom_point(data = bikeometer_table, 
             aes(x = longitude, y = latitude, color = region), 
             alpha = 0.5, 
             size = 3) +
             theme_map()

From my experience, the most popular bike trail in Arlington will be on the Arlington Loop. I want to get a sense if these Bikeometers are on the Arlington Loop, and if not, choose Bikeometers that are on the loop. My hypothesis is that bike trails that weren’t popular before the pandemic won’t see much change after the pandemic. I generally either ride the Arlington Loop or the W&OD Trails as they are the most bike-friendly. A trail that is less bike-friendly before the pandemic won’t become that much more bike-friendly during a pandemic, even if there is a reduction of traffic in the city.

I’ll use the GPS coordinates from my Garmin GPS watch to create a layer for my plot that shows the Arlington Loop trail. I downloaded the gpx file from the Garmin Connect website.